in_process_exporter_metrics: Expire metrics for dead processes #11760

Open
piwai wants to merge 6 commits into fluent:master from piwai:fix-issue-9547

@piwai
Contributor

@piwai piwai commented Apr 29, 2026

This adds an expiration mechanism for dead processes, so that Prometheus metrics are no longer published for processes that are not running on the host anymore.
It uses two hash tables to store the process and thread IDs seen during a scan, so that metrics whose IDs are absent from these tables can be expired.
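
For readers unfamiliar with the approach, the idea resembles a mark-and-sweep pass. The sketch below is only an illustration under simplifying assumptions: `id_set`, `series`, and `purge_dead` are hypothetical names, the set is a tiny fixed-size table rather than Fluent Bit's hash table, and a series is identified by its PID alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SET_CAP 64 /* illustrative capacity, must be a power of two */

/* Tiny open-addressing set of IDs; a stand-in for the hash tables in the PR. */
struct id_set {
    int    slots[SET_CAP];
    bool   used[SET_CAP];
    size_t count;
};

static void id_set_init(struct id_set *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
}

/* Mark phase: returns false when the set is full so the caller can skip the sweep. */
static bool id_set_add(struct id_set *s, int id)
{
    size_t i = (size_t) id & (SET_CAP - 1);

    while (s->used[i]) {
        if (s->slots[i] == id) {
            return true; /* already recorded */
        }
        i = (i + 1) & (SET_CAP - 1);
    }
    if (s->count >= SET_CAP - 1) {
        return false; /* keep one slot free so probing always terminates */
    }
    s->used[i] = true;
    s->slots[i] = id;
    s->count++;
    return true;
}

static bool id_set_has(const struct id_set *s, int id)
{
    size_t i = (size_t) id & (SET_CAP - 1);

    while (s->used[i]) {
        if (s->slots[i] == id) {
            return true;
        }
        i = (i + 1) & (SET_CAP - 1);
    }
    return false;
}

/* Hypothetical exported series, identified here only by its pid label. */
struct series {
    int    pid;
    double value;
};

/* Sweep phase: compact the array in place, keeping only series whose pid
 * was seen during the scan. Returns the new number of series. */
static size_t purge_dead(struct series *all, size_t n, const struct id_set *active)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (id_set_has(active, all[i].pid)) {
            all[kept++] = all[i];
        }
    }
    return kept;
}
```

A collection pass would call `id_set_add()` for every PID found under /proc, then call `purge_dead()` once at the end of the pass.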

Fixes #9547

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Automatically removes stale process and thread metrics when entities are no longer active, improving metric accuracy.
    • Skips removals and logs a warning when activity detection is incomplete to prevent unsafe deletions.
    • Corrects thread context-switch metric labeling to use the proper process name.
  • Improvements

    • Finer-grained per-process and per-thread activity tracking via composite identity keys.
    • New label utilities and gated purging ensure orphaned metrics are removed only when tracking is complete.

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 462fee66-c6df-4b03-89d4-a52458a97167

📥 Commits

Reviewing files that changed from the base of the PR and between 618cab2 and 0193ad4.

📒 Files selected for processing (1)
  • plugins/in_process_exporter_metrics/pe_process.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/in_process_exporter_metrics/pe_process.c

Walkthrough

Records composite identity keys for active processes and threads during /proc scans and purges stale cmetric entries whose label-derived keys are not present in the active sets. Purging runs only when active-index tracking completes; active tables are freed on teardown.

Changes

Cohort / File(s) Summary
Stale metric cleanup & active-ID tracking
plugins/in_process_exporter_metrics/pe_process.c
Adds active_pids and active_tids hash tables populated during /proc scans with composite keys (`name

Sequence Diagram(s)

sequenceDiagram
    participant Scanner as Process Scanner
    participant ThreadUpd as Thread Updater
    participant Cmetrics as Cmetrics Manager
    participant Logger as Logger

    rect rgba(200,200,255,0.5)
    Scanner->>Scanner: scan /proc, build active_pids (name|pid, name|pid|ppid)
    end

    rect rgba(200,255,200,0.5)
    Scanner->>ThreadUpd: call process_thread_update(..., active_tids, &active_index_complete)
    ThreadUpd->>ThreadUpd: discover threads, insert name|threadname|tid into active_tids
    end

    rect rgba(255,200,200,0.5)
    Scanner->>Cmetrics: call purge_stale_metrics(active_pids, active_tids, active_index_complete)
    alt active_index_complete == true
        Cmetrics->>Cmetrics: iterate metrics, extract label values, compare to active sets
        Cmetrics->>Cmetrics: remove stale metric entries
        Cmetrics->>Logger: log removals
    else active_index_complete == false
        Cmetrics->>Logger: emit warning and skip purge
    end
    end
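
The gating step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `pass_state`, `pass_begin`, `pass_record`, and `pass_should_purge` are hypothetical names, and the `warnings` counter stands in for the logged warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pass bookkeeping for the purge gate. */
struct pass_state {
    bool   index_complete; /* true while every insert/scan step succeeded */
    size_t warnings;       /* stands in for emitting a warning log line */
};

/* Start a collection pass: the index is complete until proven otherwise. */
static void pass_begin(struct pass_state *st)
{
    assert(st != NULL);
    st->index_complete = true;
}

/* Record the result of one scan or insert step (0 = success). */
static void pass_record(struct pass_state *st, int rc)
{
    if (rc != 0) {
        st->index_complete = false; /* one failure invalidates the whole pass */
    }
}

/* End of pass: purge only when the active index is known to be complete. */
static bool pass_should_purge(struct pass_state *st)
{
    if (!st->index_complete) {
        st->warnings++; /* warn and skip, so live metrics are not deleted */
        return false;
    }
    return true;
}
```

The design choice is conservative: a single failed step skips the purge for that pass, and stale series are simply removed on a later pass that completes cleanly.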

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

backport to v4.2.x

Suggested reviewers

  • edsiper

Poem

🐰 I hopped through /proc at dawn's light,

counted names and IDs tidy and neat,
swept stale metrics from the meadow bright,
kept only active footprints beneath my feet,
then munched a carrot in gentle retreat.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding metric expiration for dead processes, which directly addresses the core objective of the PR. |
| Linked Issues check | ✅ Passed | The implementation adds composite identity tracking and metric purging for inactive processes/threads, directly addressing issue #9547's requirement to expire metrics for dead processes and prevent unbounded memory/CPU growth. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing the metric expiration mechanism for dead processes. The thread label fix for context-switch metrics is a necessary correction to support the identity tracking. |


@piwai
Contributor Author

piwai commented Apr 29, 2026

Here's the requested Valgrind log:
valgrind.log

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99f12369f0

Comment thread plugins/in_process_exporter_metrics/pe_process.c Outdated
Comment thread plugins/in_process_exporter_metrics/pe_process.c Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/in_process_exporter_metrics/pe_process.c`:
- Around line 946-967: The purge_stale_metrics function currently checks only
PID/TID via get_metric_label_value against active_ids, which allows stale series
to survive on ID reuse; update purge_stale_metrics to build a collision-free key
for each metric (for example by concatenating the full label tuple or PID/TID
plus process start time) and use that key when querying flb_hash_table_get
instead of the single id_val so the exact emitted identity is checked before
calling cmt_map_metric_destroy; apply the same change to the equivalent logic
referenced around the other purge block (the similar code at the later section)
so both places use the full-label or start-time-based key.
- Line 524: flb_hash_table_add calls that populate
active_tids/active_pids/active_fds must not be ignored — check each
flb_hash_table_add return value and treat a failure as a fatal error for this
collection pass: on add failure, log a descriptive error (including the
tid/pid/fd value), set a flag like index_complete = false (or return an error
code) and skip the end-of-pass purge; alternatively return early from the
collection function so purge code does not run. Update every insertion site that
uses flb_hash_table_add (references: active_tids, active_pids, active_fds) to
perform this check and ensure the purge path is gated by index_complete or
skipped when an insertion failed.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c226035-2a3c-4f70-b7f7-44e2c70a3119

📥 Commits

Reviewing files that changed from the base of the PR and between 230eb3c and 99f1236.

📒 Files selected for processing (1)
  • plugins/in_process_exporter_metrics/pe_process.c

Comment thread plugins/in_process_exporter_metrics/pe_process.c Outdated
Comment thread plugins/in_process_exporter_metrics/pe_process.c Outdated
@edsiper
Member

edsiper commented Apr 30, 2026

@piwai thanks for this contribution. Please check the reviews provided by the AI agent.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/in_process_exporter_metrics/pe_process.c (1)

526-533: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

name:id is still not a collision-free liveness key.

The purge now matches on name + pid/tid, but several maps distinguish series by more than that (ppid for process series, threadname for thread series). If an ID is reused for another process/thread with the same name before the next scrape, the lookup still hits and the old ppid/threadname series survives. That means stale series can still accumulate under reuse-heavy workloads.

Please purge against the exact emitted identity for each map, or another collision-free identity such as start time.

Also applies to: 955-983, 1099-1106, 1267-1286
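
To make the reuse scenario concrete, here is a small sketch under stated assumptions: `proc_id`, `weak_key`, and `full_key` are hypothetical helpers, and `start_ticks` stands for a per-process start time such as the `starttime` field of /proc/<pid>/stat. Delimiter ambiguity is ignored here on purpose.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical identity of one scanned process. */
struct proc_id {
    const char        *name;
    int                pid;
    int                ppid;
    unsigned long long start_ticks; /* assumed: start time read from procfs */
};

/* name:pid only. A reused PID with the same name yields the same key. */
static int weak_key(char *buf, size_t size, const struct proc_id *p)
{
    assert(buf != NULL && p != NULL);
    return snprintf(buf, size, "%s:%d", p->name, p->pid);
}

/* Full emitted identity plus start time. A reused PID yields a new key. */
static int full_key(char *buf, size_t size, const struct proc_id *p)
{
    assert(buf != NULL && p != NULL);
    return snprintf(buf, size, "%s:%d:%d:%llu",
                    p->name, p->pid, p->ppid, p->start_ticks);
}
```

With two `proc_id` values that share `name` and `pid` but differ in `ppid` or `start_ticks`, `weak_key()` produces identical strings, so the old series would look alive, while `full_key()` produces different ones.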

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/in_process_exporter_metrics/pe_process.c` around lines 526 - 533, The
current liveness key built in the active_tids purge (created via
flb_sds_create(name) + ":" + tid_str and added with
flb_hash_table_add(active_tids, active_key, ...)) is not collision-free because
it omits other identity fields (e.g., ppid for process maps, threadname for
thread maps); update the key construction wherever you see this pattern (the
active_key + flb_hash_table_add calls in this file) to include the exact emitted
identity used when the series is created—either append ppid or threadname (or
the process start time) to the key so the purge matches the full emitted
identity, and apply the same fix to the other similar blocks that build
active_key before calling flb_hash_table_add.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/in_process_exporter_metrics/pe_process.c`:
- Around line 1244-1247: When process_thread_update(ctx, ts, pid_str, name,
active_tids, &active_index_complete) returns -1 you must mark the active-index
gate invalid so the purge doesn't remove live thread metrics; update the caller
to set active_index_complete = false (or otherwise invalidate the purge gate)
immediately in the error branch where ret == -1. Apply the same change to the
other failing path referenced (the similar block around the 1265-1289 range) so
both failure paths for process_thread_update() invalidate active_index_complete
when thread enumeration fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fb43019-e2d2-4ca2-b95c-1fa47e7631c9

📥 Commits

Reviewing files that changed from the base of the PR and between 99f1236 and 2703513.

📒 Files selected for processing (1)
  • plugins/in_process_exporter_metrics/pe_process.c

Comment thread plugins/in_process_exporter_metrics/pe_process.c
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/in_process_exporter_metrics/pe_process.c`:
- Around line 526-533: The SDS key construction code using flb_sds_create and
flb_sds_cat_safe must check for allocation/appending failure before calling
flb_sds_len or flb_hash_table_add; update the block that builds active_key to
validate that flb_sds_create returned non-NULL and that each flb_sds_cat_safe
succeeded, and if any step fails set *active_index_complete = FLB_FALSE, avoid
calling flb_sds_len or flb_hash_table_add with a NULL key, free/destroy any
partially created SDS via flb_sds_destroy if non-NULL, and return/continue as
appropriate; apply the same defensive checks to the other SDS key construction
sites (the analogous blocks around lines 976-983 and 1099-1106) to prevent NULL
dereference in flb_sds_len or hash operations.
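
The defensive pattern requested above can be illustrated without the SDS API. The following sketch uses only the C standard library; `build_key` is a hypothetical helper, and the real code would use Fluent Bit's SDS functions instead of `malloc` and `snprintf`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "name|id" on the heap. On any failure, clear *index_complete and
 * return NULL so the caller never hands a NULL key to the hash table. */
static char *build_key(const char *name, const char *id, bool *index_complete)
{
    int   need;
    char *key;

    assert(index_complete != NULL);

    if (name == NULL || id == NULL) {
        *index_complete = false;
        return NULL;
    }

    /* First call only measures the required length. */
    need = snprintf(NULL, 0, "%s|%s", name, id);
    if (need < 0) {
        *index_complete = false;
        return NULL;
    }

    key = malloc((size_t) need + 1);
    if (key == NULL) {
        *index_complete = false;
        return NULL;
    }

    if (snprintf(key, (size_t) need + 1, "%s|%s", name, id) != need) {
        free(key); /* release the partially built key before bailing out */
        *index_complete = false;
        return NULL;
    }
    return key; /* caller owns the key and must free() it */
}
```

The caller checks for `NULL`, skips the insert for that entry, and later skips the purge because `*index_complete` is false.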
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fbc9fa0c-f17d-49f5-8780-04c410d91e63

📥 Commits

Reviewing files that changed from the base of the PR and between 2703513 and 5088c86.

📒 Files selected for processing (1)
  • plugins/in_process_exporter_metrics/pe_process.c

Comment thread plugins/in_process_exporter_metrics/pe_process.c Outdated
@piwai
Contributor Author

piwai commented Apr 30, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_process_exporter_metrics/pe_process.c (1)

1151-1158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalidate the purge index when ppid parsing fails.

By this point the name|pid key has already been inserted, but the name|pid|ppid key has not. If Line 1153 or Line 1158 jumps to cleanup, active_pids is partial while active_index_complete stays true, so the final purge can delete live metrics for every map keyed by {name,pid,ppid}.

Suggested fix
             tmp = strstr(entry->str, ")");
             if (tmp == NULL) {
+                active_index_complete = FLB_FALSE;
                 goto cleanup;
             }

             ret = flb_slist_split_string(&split_list, tmp+2, ' ', -1);
             if (ret == -1) {
+                active_index_complete = FLB_FALSE;
                 goto cleanup;
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/in_process_exporter_metrics/pe_process.c` around lines 1151 - 1158,
Parsing failures for ppid (when tmp == NULL or flb_slist_split_string returns
-1) currently jump to cleanup without invalidating the purge index, leaving
active_pids partial while active_index_complete remains true; update the error
paths in the ppid parsing block inside pe_process.c (the section using
entry->str, tmp, and flb_slist_split_string(&split_list,...)) so that before
jumping to the cleanup label you set active_index_complete = false (and
optionally clear any partial entries in active_pids if present) to prevent
premature purging of live {name,pid,ppid}-keyed metrics.
♻️ Duplicate comments (2)
plugins/in_process_exporter_metrics/pe_process.c (2)

1301-1304: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalidate the purge gate when thread scanning aborts.

If process_thread_update() returns -1, active_tids is incomplete for that process but active_index_complete remains true. A transient /proc/<pid>/task failure can then make the end-of-pass purge remove live thread metrics.

Suggested fix
                 ret = process_thread_update(ctx, ts, pid_str, name, active_tids, &active_index_complete);
                 if (ret == -1) {
+                    active_index_complete = FLB_FALSE;
                     flb_plg_debug(ctx->ins, "collect thread procfs is failed on the pid = %s", pid_str);
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/in_process_exporter_metrics/pe_process.c` around lines 1301 - 1304,
When process_thread_update(ctx, ts, pid_str, name, active_tids,
&active_index_complete) returns -1, mark the purge gate as invalid so incomplete
active_tids cannot trigger end-of-pass purges; specifically, in the error branch
where ret == -1 set active_index_complete = false (or otherwise invalidate the
purge condition) for that pid and ensure subsequent logic treats that process as
incomplete so live thread metrics are not purged prematurely.

544-555: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The |-joined identity keys are still collision-prone.

These tables now mirror the same plain-text encoding on insert and purge, but the encoding is not injective: active_pids mixes name|pid and name|pid|ppid, and process/thread names can legally contain |. Distinct label tuples can therefore collapse to the same hash key, which keeps stale series alive or purges the wrong series. Please switch both builders and lookups to a collision-free encoding instead of delimiter-joined strings.

Also applies to: 970-1016, 1131-1145, 1170-1185
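
One way to read "collision-free encoding" is a length-prefixed format, where each field is written as `<length>:<bytes>`. The sketch below is an assumption about what such an encoding could look like, with hypothetical helpers `append_field` and `encode_key`; it is not taken from the PR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one field as "<length>:<bytes>".
 * Returns the new offset, or -1 if the buffer is too small. */
static int append_field(char *buf, size_t size, int off, const char *field)
{
    int    n;
    size_t room;

    assert(buf != NULL && field != NULL);
    if (off < 0 || (size_t) off >= size) {
        return -1;
    }
    room = size - (size_t) off;
    n = snprintf(buf + off, room, "%zu:%s", strlen(field), field);
    if (n < 0 || (size_t) n >= room) {
        return -1; /* output was truncated */
    }
    return off + n;
}

/* Encode a tuple of label values into one key.
 * Returns the key length, or -1 on overflow. */
static int encode_key(char *buf, size_t size,
                      const char *const *fields, size_t count)
{
    int off = 0;

    if (size == 0) {
        return -1;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < count && off >= 0; i++) {
        off = append_field(buf, size, off, fields[i]);
    }
    return off;
}
```

For example, the tuples `{"a|b", "1"}` and `{"a", "b|1"}` both join to `a|b|1` with a plain `|` delimiter, but encode to `3:a|b1:1` and `1:a3:b|1` here, so they no longer collide. Tuples of different arity also stay distinct because every field carries its own length.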

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/in_process_exporter_metrics/pe_process.c` around lines 544 - 555, The
current "|"‑joined keys (built via flb_sds_create + flb_sds_cat_safe and
inserted with flb_hash_table_add) are not injective because names can contain
"|" and some tables mix different tuple arities; replace this with a
collision-free encoding (e.g., length-prefixed fields or a binary composite key)
for both builders and lookups: change the code that builds active_key
(references: active_key, flb_sds_create, flb_sds_cat_safe) and the hash table
inserts/queries (references: flb_hash_table_add, active_tids, active_pids) to
produce a unique binary key (field lengths + data or a fixed-struct key) and
update all corresponding lookups/purges (also modify the same pattern at the
other occurrences noted) so key creation and key comparison are consistent and
no delimiter ambiguity remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/in_process_exporter_metrics/pe_process.c`:
- Around line 544-554: The key-construction error path leaks the buffer returned
by get_name() (thread_name) because the early continue skips cleanup; modify the
failure branch in the active_key build (the if that sets *active_index_complete
= FLB_FALSE) to free thread_name (and tid_str if it was allocated similarly)
before calling flb_sds_destroy(active_key) and continue. Locate the code around
active_key creation/concatenation (the flb_sds_create and flb_sds_cat_safe
calls) and insert the appropriate deallocation (e.g., flb_free(thread_name) or
free(thread_name) consistent with how get_name() allocates) just prior to the
continue so no leak remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4acaf5ca-5287-46fe-bffe-a9a850369e8c

📥 Commits

Reviewing files that changed from the base of the PR and between b7214d3 and 618cab2.

📒 Files selected for processing (1)
  • plugins/in_process_exporter_metrics/pe_process.c

Comment thread plugins/in_process_exporter_metrics/pe_process.c
piwai added 6 commits April 30, 2026 17:55
Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
@piwai
Contributor Author

piwai commented Apr 30, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@piwai
Contributor Author

piwai commented Apr 30, 2026

@edsiper Took some time, but I think I addressed all issues! For the hash table key construction using "|" characters, I consider that the proposed fix with fixed-length and NUL bytes would add extra complexity for very little benefit, as the collision is very unlikely and even if it happens, it will be purged eventually in case of high load.

Contributor

@cosmo0920 cosmo0920 left a comment

This is a really interesting and promising idea!
However, we want to avoid touching the internal structures of cmetrics from Fluent Bit's plugins as much as possible. Could you try to implement this kind of feature inside cmetrics, if possible?
We have also been looking for a way to purge stale metrics, but have not succeeded so far.

@piwai
Contributor Author

piwai commented Apr 30, 2026

@cosmo0920 thanks a lot for the review!
Just to be sure I've understood correctly, and to avoid further rounds of rework: the new implementation could remove the purge_stale_metrics() function from pe_process and instead add a more generic function in lib/cmetrics/, such as "cmt_expire()". That function would manipulate the cmetrics structs and could take a callback parameter for the expiration criteria, implemented per module (e.g. for pe_process, checking whether the processes are still alive). Is that right?
Another question: do you have any examples of other places where metrics are not expired?

@cosmo0920
Contributor

cosmo0920 commented May 1, 2026

> Just to be sure I've understood correctly, and to avoid further rounds of rework: the new implementation could remove the purge_stale_metrics() function from pe_process and instead add a more generic function in lib/cmetrics/, such as "cmt_expire()". That function would manipulate the cmetrics structs and could take a callback parameter for the expiration criteria, implemented per module (e.g. for pe_process, checking whether the processes are still alive). Is that right?

Yup, implementing cmt_expire() or a similar function that expires metrics.
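
A callback-based expiry of the kind discussed here might look roughly like the sketch below. Everything in it is hypothetical: `cmt_expire()` does not exist in cmetrics as far as this thread shows, and `toy_series`, `toy_expire`, and `pid_is_dead` are toy stand-ins, not cmetrics types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one labeled series; NOT a cmetrics type. */
struct toy_series {
    const char *labels[3]; /* assumed layout: labels[0] = name, labels[1] = pid */
    size_t      label_count;
};

/* Expiration criterion supplied by each plugin. */
typedef bool (*toy_expire_cb)(const struct toy_series *s, void *user_data);

/* Hypothetical generic sweep: drop every series the callback reports as
 * stale, compacting the array in place. Returns how many were removed. */
static size_t toy_expire(struct toy_series *all, size_t *n,
                         toy_expire_cb is_stale, void *user_data)
{
    size_t kept = 0;
    size_t removed;

    assert(all != NULL && n != NULL && is_stale != NULL);
    for (size_t i = 0; i < *n; i++) {
        if (!is_stale(&all[i], user_data)) {
            all[kept++] = all[i];
        }
    }
    removed = *n - kept;
    *n = kept;
    return removed;
}

/* Example criterion for a process-exporter-like plugin: the series is
 * stale when its pid label is not in the list of live PIDs. */
struct live_pids {
    const char *const *pids;
    size_t             count;
};

static bool pid_is_dead(const struct toy_series *s, void *user_data)
{
    const struct live_pids *live = user_data;

    for (size_t i = 0; i < live->count; i++) {
        if (strcmp(s->labels[1], live->pids[i]) == 0) {
            return false;
        }
    }
    return true;
}
```

The point of the callback is that the library owns the traversal and removal, while each plugin only supplies the criterion, so plugins would not need to touch internal structures directly.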

> Another question: do you have any examples of other places where metrics are not expired?

We're also suffering from stale, out-of-date metrics in out_prometheus_remote_write and out_prometheus_exporter.
So we would really like to mitigate this issue.

Development

Successfully merging this pull request may close these issues.

process_exporter input plugin does not handle dead processes

3 participants